Architecture
Data flow model
An Event is a unit of data that flows through a Flume agent. The Event
flows from Source to Channel to Sink, and is represented by an
implementation of the Event interface. An Event carries a payload (byte
array) that is accompanied by an optional set of headers (string attributes).
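
As a rough illustration of that shape, the sketch below models an Event as a byte-array payload plus an optional map of string headers. `SketchEvent` and `SketchEventDemo` are hypothetical names used only for this example; this is not Flume's own Event implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an Event: a byte-array payload plus optional
// string headers. It only mirrors the shape described above.
final class SketchEvent {
    private final byte[] body;
    private final Map<String, String> headers;

    SketchEvent(byte[] body, Map<String, String> headers) {
        // Copy the inputs so later changes by the caller do not affect the event.
        this.body = body.clone();
        this.headers = (headers == null)
                ? Collections.<String, String>emptyMap()
                : Collections.unmodifiableMap(new HashMap<>(headers));
    }

    byte[] getBody() {
        return body.clone();
    }

    Map<String, String> getHeaders() {
        return headers;
    }
}

class SketchEventDemo {
    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("host", "web01"); // illustrative header, not a required key

        SketchEvent event = new SketchEvent(
                "GET /index.html".getBytes(StandardCharsets.UTF_8), headers);

        System.out.println(new String(event.getBody(), StandardCharsets.UTF_8));
        System.out.println(event.getHeaders());
    }
}
```

Passing `null` for the headers yields an empty map, which reflects the headers being optional.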
A Flume agent is a (JVM) process that hosts the components through which
Events flow from an external source to an external destination.
A Source consumes Events that have a specific format; those
Events are delivered to the Source by an external source such as a web
server. For example, an AvroSource can be used to receive Avro Events
from clients or from other Flume agents in the flow. When a Source receives
an Event, it stores it into one or more Channels. The Channel is
a passive store that holds the Event until that Event is consumed by a
Sink. One type of Channel available in Flume is the FileChannel
which uses the local filesystem as its backing store. A Sink is responsible
for removing an Event from the Channel and putting it into an external
repository like HDFS (in the case of an HDFSEventSink) or forwarding it to
the Source at the next hop of the flow. The Source and Sink within
a given agent run asynchronously, with the Events staged in the
Channel between them.
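
The asynchronous hand-off described above can be sketched with two threads and a queue. This is a toy model under stated assumptions: `FlowSketch`, its `run` method, and the use of a `LinkedBlockingQueue` as the Channel are all hypothetical, and plain strings stand in for Events. It is not how Flume implements its components.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model of Source -> Channel -> Sink inside one agent.
class FlowSketch {

    // Pushes every incoming event through a queue-backed "channel" and
    // returns what the sink wrote to its stand-in repository.
    static List<String> run(List<String> incoming) {
        BlockingQueue<String> channel = new LinkedBlockingQueue<>();
        List<String> repository = new ArrayList<>(); // stand-in for an external store

        // Source: stores each received event into the channel.
        Thread source = new Thread(() -> {
            for (String event : incoming) {
                channel.add(event);
            }
        });

        // Sink: removes events from the channel and writes them out.
        Thread sink = new Thread(() -> {
            try {
                for (int i = 0; i < incoming.size(); i++) {
                    String event = channel.poll(5, TimeUnit.SECONDS);
                    if (event == null) {
                        return; // nothing arrived in time; stop this toy sink
                    }
                    repository.add(event);
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });

        // The sink may start before the source has produced anything;
        // the channel decouples the two.
        sink.start();
        source.start();
        try {
            source.join();
            sink.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for flow", ex);
        }
        return repository;
    }
}
```

Because the queue is the only thing the two threads share, neither side needs to know how fast the other runs, which is the role the Channel plays as a passive store.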
Reliability
An Event is staged in a Flume agent’s Channel. It is then the
Sink’s responsibility to deliver the Event to the next agent or
terminal repository (like HDFS) in the flow. The Sink removes an Event
from the Channel only after the Event is stored into the Channel of
the next agent or stored in the terminal repository. This is how the single-hop
message delivery semantics in Flume provide end-to-end reliability of the flow.
Flume uses a transactional approach to guarantee the reliable delivery of the
Events. The Sources and Sinks encapsulate the
storage/retrieval of the Events in a Transaction provided by the
Channel. This ensures that the set of Events is reliably passed from
point to point in the flow. In the case of a multi-hop flow, the Sink from
the previous hop and the Source of the next hop both have their
Transactions open to ensure that the Event data is safely stored in
the Channel of the next hop.
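
The remove-only-after-delivery rule can be illustrated with a small in-memory model. The sketch below is an assumption-laden simplification: `TxChannelSketch`, `deliverOne`, and the `Predicate` that plays the next hop are hypothetical names, strings stand in for Events, and only one take-side transaction is modelled at a time. It does not reproduce Flume's Transaction API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical in-memory channel with a single take-side transaction.
class TxChannelSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    // Events taken in the currently open transaction, oldest first.
    private final Deque<String> taken = new ArrayDeque<>();

    void put(String event) {
        queue.addLast(event);
    }

    // Takes the next event but remembers it until commit or rollback.
    String take() {
        String event = queue.pollFirst();
        if (event != null) {
            taken.addLast(event);
        }
        return event;
    }

    // Delivery succeeded: the taken events are gone for good.
    void commit() {
        taken.clear();
    }

    // Delivery failed: put the taken events back at the head, in original order.
    void rollback() {
        while (!taken.isEmpty()) {
            queue.addFirst(taken.pollLast());
        }
    }

    int size() {
        return queue.size();
    }

    // Sink step: take one event, try to hand it to the next hop, and
    // commit only if the next hop reports that it stored the event.
    static boolean deliverOne(TxChannelSketch channel, Predicate<String> nextHop) {
        String event = channel.take();
        if (event == null) {
            channel.commit(); // nothing to deliver
            return false;
        }
        try {
            if (nextHop.test(event)) {
                channel.commit();
                return true;
            }
            channel.rollback();
            return false;
        } catch (RuntimeException ex) {
            channel.rollback(); // a failing next hop must not lose the event
            return false;
        }
    }
}
```

If the next hop rejects the event or throws, the event stays in the channel and a later attempt can deliver it, which is the property the transactional approach is meant to provide.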